We present a new theory that takes internal dynamics of proteins into account to
describe forced-unfolding and force-quench refolding in single molecule experiments.
In the current experimental setup (Atomic Force Microscopy or Laser Optical Tweezers)
the distribution of unfolding times, P(t), is measured by applying a constant
stretching force f_S, from which the apparent f_S-dependent unfolding rate is obtained.
Describing the complexity of the underlying energy landscape requires additional
probes that can incorporate the dynamics of tension propagation and relaxation of
the polypeptide chain upon force quench. We introduce a theory of force correlation
spectroscopy (FCS) to map the parameters of the energy landscape of proteins. In
the FCS the joint distribution P(T, t) of folding and unfolding times is constructed
by repeated application of cycles of stretching at constant f_S separated by release
periods T during which the force is quenched to f_Q < f_S. During the release period,
the protein can collapse to a manifold of compact states or refold. We show that
P(T, t) at various f_S and f_Q values can be used to resolve the kinetics of unfolding
as well as formation of native contacts. We also present methods to extract the
parameters of the energy landscape using chain extension as the reaction coordinate
and P(T, t). The theory and a worm-like chain model for the unfolded states allow
us to obtain the persistence length l_p and the f_Q-dependent relaxation time, which
gives an estimate of the collapse timescale at the single molecule level, in the coil
states of the polypeptide chain. Thus, a more complete description of the landscape of
protein native interactions can be mapped out if unfolding time data are collected at
several values of f_S and f_Q. We illustrate the utility of the proposed formalism by
analyzing simulations of unfolding-refolding trajectories of a coarse-grained protein
(S1) with β-sheet architecture for several values of f_S, T and f_Q=0. The simulations
of stretch-relax trajectories are used to map many of the parameters that characterize
the energy landscape of S1.
I. INTRODUCTION



Several biological functions are triggered by mechanical force. These include stretching and
contraction of muscle proteins such as titin, rolling and tethering of cell adhesion molecules,
translocation of proteins across membranes, and unfoldase activity of chaperonins and
proteasomes. Understanding these diverse functions requires probing the response of
biomolecules to applied external tension. Dynamical responses to mechanical force can be used
to characterize in detail the free energy landscape of biomolecules. Advances in manipulating
micron-sized beads attached to single biomolecules have made it possible to stretch, twist,
unfold and even unbind proteins using forces on the order of tens of piconewtons [12, 13, 14].
Single molecule force spectroscopy on a number of different systems has allowed us to obtain a
glimpse of the unbinding energy landscape of biomolecules and protein-protein complexes. In AFM
experiments, used to unfold proteins by force, one end of a protein is adsorbed on a template
and a constant or a time-dependent pulling force is applied to the other terminus. By measuring
the distribution of forces required to completely unfold proteins and the associated unfolding
times, the global parameters of the protein energy landscape can be estimated [26, 27, 28, 29,
30, 31]. These insightful experiments, when combined with theoretical studies [32, 33, 34], can
give an unprecedented picture of forced-unfolding pathways.

Current experiments have been designed primarily to obtain information on forced unfolding
of proteins and do not probe the reverse folding process. Although force-clamp AFM techniques
have been used recently to probe (re)folding of single ubiquitin polyprotein [24], the lack of
theoretical approaches has made it difficult to interpret these pioneering experiments [35, 36].
Secondly, the resolution of multiple timescales in protein folding and refolding requires not
only novel experimental tools for single molecule experiments but also new theoretical analysis
methods. Minimally, unfolding of proteins by a stretching force f_S is described by the global
unfolding time τ_U(f_S), timescales for propagation of the applied tension, and the dynamics
describing the intermediates or "protein coil" states. Finally, if the external conditions (loading
rate or the magnitude of f_S) are such that these processes can occur on similar timescales, then
the analysis of the data requires new theoretical ideas.

For forced unfolding the variable conjugate to f_S, namely the protein end-to-end distance X,
is a natural reaction coordinate. However, X is not appropriate for describing protein refolding
which, due to substantial variations in the duration of folding barrier crossing, may range from
milliseconds to a few minutes. To obtain statistically meaningful distributions of unfolding
times, a large number of complete unfolding trajectories must be recorded, which requires
repeated application of the pulling force. The inherent heterogeneity in the duration of folding
and the lack of correlation between the evolution of X and (re)folding progress create an
"initial state ambiguity" when force is repeatedly applied to the same molecule. As a result, the
interpretation of unfolding time data is complicated, especially when the conditions are such
that the reverse folding process at the quenched force f_Q can occur on a long timescale, τ_F(f_Q).

Motivated by the need to assess the effect of the multiple timescales on the energy landscape
of folding and unfolding, we develop a new theoretical formalism to describe correlations between
the various dynamical processes. Our theory leads naturally to a new class of single molecule
force experiments, namely force correlation spectroscopy (FCS), which can be used to study
both forced unfolding and force-quenched (re)folding. Such studies can provide more
detailed information on both kinetic and dynamic events underlying unfolding and refolding. In
the FCS, cycles of stretching (f_S) are separated by periods T of quenched force f_Q < f_S during
which the stretched protein can relax from its unfolded state X_U to the coil state X_C or even
(re)fold to the native basin of attraction (NBA) state. The two experimental observables are X
and the unfolding time t. The central quantity in the FCS is the distribution of unfolding times
P(T, t) separated by recoil or refolding events of duration T. The higher order statistical measure
embedded in P(T, t) is readily accessible by constructing a histogram of unfolding times for
varying T and does not require additional technical developments. The crucial element in the
proposed analysis is that P(T, t) is computed by averaging over final (unfolded) states, rather
than initial (folded) states. This procedure removes the potential ambiguity of not precisely
knowing the initial distribution of conformations in the NBA. Despite the uniqueness of the
native state, there are a number of conformations in the NBA that reflect the fluctuations of the
folded state. The proposed formalism is a natural extension of unbinding time data analysis.
Indeed, P(T, t) reduces to the standard distribution of unfolding times P(t) when T exceeds
the protein (re)folding timescale τ_F(f_Q).

The complexity of the energy landscape of proteins demands FCS and the accompanying theoretical
analysis. Current single molecule experiments on poly-Ub or poly-Ig27 (performed in the T→∞
regime) show that in these systems unfolding occurs abruptly in an apparent all-or-none manner
or through a dominant intermediate [32]. On the other hand, refolding upon force-quench
is complex and surely occurs through an ensemble of collapsed coiled states. A number
of timescales characterize the stretch-release experiments. These include, besides τ_F(f_Q), the
f_S-dependent unfolding time and the relaxation dynamics in the coiled states {C} upon
force-quench, τ_d(f_Q). In addition, if we assume that X is an appropriate reaction coordinate,
then the locations of the NBA, {C}, and the transition state ensembles, together with the
associated widths, are required for a complete characterization of the underlying energy
landscape. Most of these parameters can be extracted using the proposed FCS experiments and the
theoretical analysis presented here.

In a preliminary study [37], we reported the basics of the theory, which was used to propose a
new class of single molecule force spectroscopy methods for deciphering protein-protein
interactions. The current paper is devoted to further developments in the theory with application
to forced-unfolding and force-quench refolding of proteins. In particular, we illustrate the
efficacy of the FCS by analyzing single unfolding-refolding trajectories generated for a
coarse-grained model (CGM) protein S1 with β-sheet architecture [38, 39]. We showed previously
that forced unraveling of S1, in the limit of T→∞, can be described by an apparent "two-state"
kinetics [40, 41]. The thermodynamics and kinetics observed in S1 are characteristic of a number
of proteins for which folding/unfolding fits two-state behavior well [42]. Thus, S1 serves as a
useful model to illustrate the efficacy of the FCS. Here, we show that by varying T and the
magnitude of the stretching force f_S or the quench force f_Q, the entire dynamical process,
starting from the NBA to the fully stretched state, can be resolved. In the process we establish
that P(T, t), which can be measured using AFM or laser optical tweezer (LOT) experiments,
provides a convenient way of characterizing the energy landscape of biomolecules in detail.



II. MODELS AND METHODS: 



Theory of force correlation spectroscopy (FCS): In single molecule atomic force microscopy
(AFM) experiments used to unfold proteins by force, the N-terminus of a protein is anchored at
the surface and the C-terminus is attached to the cantilever tip through a polymer linker (Figure
1). The molecule is stretched by displacing the cantilever tip and the resulting force is measured.
From a theoretical perspective it is more convenient to envision applying a constant stretching
force f_S = f_S x̂ in the x-direction (Figure 1). The free energy in the constant force formulation
is related to the experimental setup by a Legendre transformation. More recently, it has become
possible to apply a constant force in AFM or laser optical tweezer (LOT) experiments to the
ends of a protein. With this setup the unfolding time for the end-to-end distance X to reach
the contour length L can be measured for each molecule. For a fixed f_S, repeated application of
the pulling force results in a single trajectory of unfolding times (t_1, t_2, t_3, . . ., Figure 1)
from which the histogram of unfolding times P(t) is obtained. The f_S-dependent unfolding rate
K_U is obtained by fitting a Poissonian formula K_U exp[−K_U t] to the kinetics of the population
of folded states p_F, which is related to P(t) as p_F(t) = 1 − ∫_0^t ds P(s).
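The relation between P(t), p_F(t) and K_U can be illustrated with a minimal sketch. It assumes single-exponential (Poissonian) unfolding, for which the maximum-likelihood rate is the inverse of the mean unfolding time; the function names, the synthetic unfolding times and the value of `true_rate` are hypothetical and serve only as a demonstration.

```python
import numpy as np

def folded_population(unfolding_times, t_grid):
    """Empirical p_F(t) = 1 - int_0^t P(s) ds: fraction of events still folded at t."""
    times = np.sort(np.asarray(unfolding_times, dtype=float))
    # Number of recorded unfolding times that are <= t, for every t in t_grid
    n_unfolded = np.searchsorted(times, t_grid, side="right")
    return 1.0 - n_unfolded / times.size

def fit_unfolding_rate(unfolding_times):
    """Maximum-likelihood K_U for P(t) = K_U exp(-K_U t): inverse mean time."""
    return 1.0 / np.mean(unfolding_times)

# Synthetic data standing in for a measured series (t_1, t_2, t_3, ...)
rng = np.random.default_rng(0)
true_rate = 0.02                      # hypothetical K_U, in 1/ns
times = rng.exponential(1.0 / true_rate, size=5000)

k_u = fit_unfolding_rate(times)
t_grid = np.linspace(0.0, 200.0, 5)
p_f = folded_population(times, t_grid)
```

If the measured p_F(t) deviates systematically from exp[−K_U t], a single rate is not an adequate description of the data.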

Because K_U is a convolution of several microscopic processes, it does not describe unfolding
in molecular detail. For instance, mechanical unfolding of fibronectin domains FnIII involves
the intermediate "aligned" state [27] with a partially disrupted hydrophobic core, which cannot be
resolved by knowing only K_U. Even when the transition from the folded state F to the globally
extended state U [27] does not involve parallel routes as in Figure 2, or multistate kinetics,
the force-induced unfolding pathway must involve formation of intermediate coiled states {C}.
The subsequent transition from {C} results in the formation of the globally unfolded state U.
The incomplete time resolution prevents current experiments from probing the signature of the
collapsed states. Probing the contributions from the underlying {C} states to global unfolding
requires sophisticated experiments that can resolve contributions from the dynamic events
underlying forced unfolding. We propose a novel experimental procedure which, when supplemented
with the unfolding time data analysis described below, allows us to separately probe the kinetics
of native interactions and the dynamics of the "protein coil" (i.e. the dynamics of the end-to-end
distance X when the native contacts are disrupted).

Consider an experiment in which stretching cycles (triggered by applying f_S) are interrupted
by relaxation intervals T during which the force is quenched to f_Q < f_S. In the time interval T,
the polypeptide chain can relax into the manifold {C} or even refold to the native state F if T is
long enough. If f_S > f_C and f_Q < f_C, where f_C is the equilibrium critical unfolding force at
the specific temperature (see the phase diagram for S1 in Ref. [40]), these transformations can be
controlled by T. In the simplest implementation we set f_Q=0. The crucial element in the FCS
experiment is that the same measurements are repeated for varying T. In the FCS the unfolding
times are binned to obtain the joint histogram P(T, t) of unfolding events of duration t generated
from the recoil manifold {C} or the native basin of attraction (NBA) or both, depending on the
duration of the relaxation time T. In the current experiments T→∞. As a result, the dynamics
of additional states in the energy landscape that are explored during folding or unfolding are
not probed.
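A minimal sketch of how the joint histogram P(T, t) might be assembled from stretch-quench records is given below. The record layout (one `(T, t)` pair per stretching cycle that follows a quench), the bin edges and the sample values are assumptions made for illustration only.

```python
import numpy as np

def joint_histogram(records, t_edges):
    """Group unfolding times t by the preceding relaxation time T.

    records : iterable of (T, t) pairs, one per stretching cycle after a quench
    t_edges : bin edges used for the unfolding time t
    Returns a dict {T: normalized histogram of t}, i.e. P(T, t) for each T.
    """
    by_T = {}
    for T, t in records:
        by_T.setdefault(T, []).append(t)
    hist = {}
    for T, ts in by_T.items():
        density, _ = np.histogram(ts, bins=t_edges, density=True)
        hist[T] = density
    return hist

# Hypothetical records: (relaxation time T in ns, unfolding time t in ns)
records = [(24, 10.0), (24, 30.0), (24, 35.0), (240, 80.0), (240, 95.0)]
t_edges = np.linspace(0.0, 100.0, 5)      # four bins of width 25 ns
P = joint_histogram(records, t_edges)
```

Each entry of `P` integrates to one over t, so histograms collected at different T can be compared directly.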

The advantages of P(T, t) over the standard distribution of unfolding times P(t) are two-fold.
First, P(T, t) is computed by averaging over well-characterized fully stretched states. This
eliminates the problem of not knowing the distribution of initial protein states encountered in
current experiments. Indeed, due to the intrinsic heterogeneity of the protein folding pathways,
after the first unfolding event the protein may or may not refold into the native conformation,
which creates the initial state ambiguity in the next (second, third, etc.) pulling cycle.
Therefore, statistical analysis based on averaging over final (stretched) states rather than
initial (folded) states allows us to overcome this difficulty. Secondly, statistical analysis of
unfolding data performed for different values of T allows us to separately probe the kinetics of
native interactions and the dynamics of X. In addition, the entire energy landscape of native
interactions can be mapped out when stretch-quench cycles are repeated for several values of f_S,
f_Q, and T.

Regime I (T ≪ τ_F): In the simplest unfolding scenario, application of f_S results in the
disruption of the native contacts (F → {C}) followed by stretching of the manifold {C} into U
(Figure 2). When stretching cycles are separated by T that is short compared to the protein
folding timescale τ_F at f_Q=0, P(T; t) is determined by the evolution of the coil state. Then the
unfolded state population p_U(T; t) is given by the convolution of protein relaxation (over time
T) from the fully stretched state X_U ≈ L to an intermediate coiled state X_1 and stretching of X_1
into the final state X_f over time t. Thus, P(T; t) is obtained from p_U(T; t) by taking the
derivative with respect to t,

$$
\begin{aligned}
P(T \ll \tau_F; t) &= \frac{d}{dt}\, p_U(T \ll \tau_F; t) \\
&= \frac{d}{dt}\, \frac{1}{N(T)} \int_{L-\delta}^{L} dX_f\, 4\pi X_f^2 \int_0^{L} dX_1\, 4\pi X_1^2 \int_0^{L} dX_U\, 4\pi X_U^2 \\
&\quad \times\, G_S(X_f, t; X_1)\, G_Q(X_1, T; X_U)\, P(X_U)
\end{aligned}
\tag{1}
$$

where N(T) is the T-dependent normalization constant obtained by extending the integral over X_f
on the right-hand side (rhs) of Eq. (1) to the range from X_f=0 to X_f=L, and P(X_U) is the
distribution of unfolded states. If X is well controlled, X_U is expected to be centered around a
fixed value X̄_U and P(X_U) ≈ δ(X_U − X̄_U). In Eq. (1), G_Q(X', t; X) and G_S(X', t; X) are,
respectively, the quenched and the stretching force dependent conditional probabilities to be in
the coiled state X' at time t, arriving from state X at time t=0. The integral over X_f is
performed in the range [L − δ, L], with X = L − δ (Figure 2) representing the unfolding distance at
which the total number of native contacts Q is at the unfolding threshold, Q ≈ Q*. It follows that
P(T; t) (Eq. (1)) contains information on the dynamics of X. By assuming a model for X and fitting
P(T; t), obtained by differentiating the integral expression appearing in Eq. (1), to the
histogram of unfolding times separated by short T ≪ τ_F, we can resolve the dynamics of the
polypeptide chain in the coil state. This allows us to evaluate the f_Q-dependent coil dynamical
timescale using single molecule force spectroscopy. The fit of Eq. (1) could be analytical or
numerical depending on the model of X.

Regime II (T ≫ τ_F): When stretching cycles are interrupted by long relaxation periods, T ≫ τ_F,
the coiled states refold to X_F (Figure 2). In this regime, the initial conformations in
forced unfolding always reside in the NBA. In this limit, P(T; t) reduces to the standard
distribution of unfolding times, P(T, t) → P(t). When T ≫ τ_F, P(T; t) is given by the convolution
of the kinetics of rupture of native contacts, resulting in protein extension ΔX_F, and the
dynamics of X from state X_F + ΔX_F to the final state X_f,

$$
\begin{aligned}
P(T \gg \tau_F; t) = P(t) &= \frac{d}{dt}\, p_U(T \gg \tau_F; t) \\
&= \frac{d}{dt}\, \frac{1}{N'(T)} \int_{L-\delta}^{L} dX_f\, 4\pi X_f^2 \int_0^{L} dX_F\, 4\pi X_F^2 \int_0^{t} dt' \\
&\quad \times\, G_S(X_f, t; X_F + \Delta X_F, t')\, P_F(t', X_F; f_S)
\end{aligned}
\tag{2}
$$

where N'(T) is the normalization constant obtained as in Eq. (1) and P_F(t, X_F; f_S) is the
probability of breaking, over time t, the contacts that stabilize the native state X_F. By
assuming a model for P_F(t, X_F; f_S) and employing information on the dynamics of X, obtained
from the short T-experiment (Eq. (1)), we can probe the disruption kinetics of native
interactions. By repeating long T-measurements at several values of f_S, we can map out the energy
landscape of native interactions projected on the direction of the end-to-end distance vector.

Regime III (T ~ τ_F): In this limit, some of the molecules reach the NBA, starting from extended
states (X ≈ L), whereas others remain in the basin {C}. The fraction of folding events p_F depends
on T, during which X approaches the average extension ⟨X_C⟩, facilitating the formation of native
contacts. Thus, P(T ~ τ_F; t), obtained in the intermediate T-experiment, involves contributions
from both {C} and F initial conditions and is given by a superposition,

$$
P(T \sim \tau_F; t) = p_F(T)\, P(T \gg \tau_F; t) + p_C(T)\, P(T \ll \tau_F; t)
\tag{3}
$$

where the probability to arrive at F from {C} at time T is given by

$$
p_F(T) = \int_0^{L} dX_1\, 4\pi X_1^2 \int_0^{L} dX_U\, 4\pi X_U^2\; P_C(T, X_1; f_Q)\, G_Q(X_1, T; X_U)\, P(X_U)
\tag{4}
$$

and the probability to remain in {C} is p_C(T) = 1 − p_F(T). In Eq. (4), P_C(T, X_1; f_Q) is the
refolding probability determined by the kinetics of formation of native contacts. Because the
dynamics of X is weakly correlated with the formation of native contacts, X_1 in P_C is expected
to be broadly distributed. Therefore, Eqs. (3) and (4) can be used to probe the kinetics of
formation of native interactions.
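As an illustration of how the superposition in Eq. (3) might be used in practice, the sketch below estimates the folded fraction p_F(T) by least squares from three binned histograms. The least-squares choice and the toy histograms are assumptions for the example and are not part of the formalism above.

```python
import numpy as np

def folded_fraction(p_mid, p_short, p_long):
    """Least-squares p_F in  p_mid = p_F * p_long + (1 - p_F) * p_short  (Eq. 3).

    p_short, p_long : binned P(T << tau_F; t) and P(T >> tau_F; t)
    p_mid           : binned P(T ~ tau_F; t) on the same bins
    """
    direction = p_long - p_short
    p_f = np.dot(p_mid - p_short, direction) / np.dot(direction, direction)
    # Clip to [0, 1] so that noise cannot produce an unphysical fraction
    return float(np.clip(p_f, 0.0, 1.0))

# Toy histograms on four bins of equal width (hypothetical numbers)
p_short = np.array([0.04, 0.00, 0.00, 0.00])
p_long = np.array([0.00, 0.01, 0.02, 0.01])
p_mid = 0.25 * p_long + 0.75 * p_short    # synthetic mixture with p_F = 0.25

p_f = folded_fraction(p_mid, p_short, p_long)
```

Repeating this estimate for several T traces out p_F(T), which can then be compared with Eq. (4).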

For Eqs. (1) and (2) to be of use, one needs to know the (re)folding timescale τ_F. The
simplest way to evaluate τ_F is to construct a series of histograms P(T_n, t) (n=1, 2, . . ., N)
for a fixed f_S and increasing relaxation time T_1 < T_2 < . . . < T_N, and compare the P(T_n, t)
with the distribution P(T*, t) obtained for sufficiently long T* ≫ τ_F. If T=T*, then all the
molecules are guaranteed to reach the NBA. The difference

$$
D(T_n) = \left| P(T_n, t) - P(T^*, t) \right|
\tag{5}
$$

is expected to be non-zero for T_n < τ_F and should vanish if T_n exceeds τ_F. Statistically, as
T_n starts to exceed τ_F, increasingly more molecules will reach the NBA by forming native
contacts. Then, more unfolding trajectories will start from folded states, and when T ≫ τ_F all
unfolding events will originate from the NBA. Therefore, D(T_n) is a sensitive measure for
identifying the kinetic signatures of forming native contacts. The utility of D(T_n) is that it is
a simple yet accurate estimator of τ_F, which can be utilized in practical applications. Indeed,
one can estimate τ_F by identifying it with the shortest T_n at which P(T_n, t) ≈ P(T*, t), i.e.
T_n ≈ τ_F. We should emphasize that to obtain τ_F from the criterion D(τ_F) ≈ 0, no assumption
about the distribution of refolding times has been made. Having evaluated τ_F, one can then use
Eqs. (1) and (2) for short and long T-measurements to resolve protein coil dynamics and rupture
kinetics of native contacts.
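The estimator based on D(T_n) can be sketched as follows. Eq. (5) defines D as a function of t; reducing it to a single number by integrating over t, and declaring convergence when that number falls below a tolerance `tol`, are assumptions made here for illustration. The histograms are toy data.

```python
import numpy as np

def histogram_distance(p_n, p_ref, t_edges):
    """Integrated |P(T_n, t) - P(T*, t)| over t, a scalar summary of Eq. (5)."""
    return float(np.sum(np.abs(p_n - p_ref) * np.diff(t_edges)))

def estimate_tau_f(T_values, histograms, p_ref, t_edges, tol):
    """Shortest T_n whose histogram matches the long-T reference within tol."""
    pairs = sorted(zip(T_values, histograms), key=lambda pair: pair[0])
    for T, p in pairs:
        if histogram_distance(p, p_ref, t_edges) <= tol:
            return T
    return None   # no T_n in the series is long enough

# Toy histograms on four bins of width 25 (hypothetical numbers)
t_edges = np.array([0.0, 25.0, 50.0, 75.0, 100.0])
p_ref = np.array([0.00, 0.01, 0.02, 0.01])          # P(T*, t)
p_short = np.array([0.04, 0.00, 0.00, 0.00])         # coil-dominated
p_mid = 0.5 * p_short + 0.5 * p_ref                  # mixed population
T_values = [24, 102, 240]
histograms = [p_short, p_mid, p_ref.copy()]

D = [histogram_distance(p, p_ref, t_edges) for p in histograms]
tau_f = estimate_tau_f(T_values, histograms, p_ref, t_edges, tol=0.05)
```

In an experiment the tolerance would have to be chosen relative to the statistical noise of the histograms.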

Let us summarize the major steps in the FCS. First, we estimate τ_F by using D(T) (Eq. (5)).
We next probe protein coil dynamics by analyzing P(T ≪ τ_F; t) obtained from short
T-measurements (Eq. (1)). In the third step, we use information on protein coil dynamics
to resolve the kinetics of rupture of native interactions contained in P(T ≫ τ_F; t) of long
T-measurements (Eq. (2)). Finally, by employing the information on protein coil dynamics and
kinetics of rupture of native interactions, we resolve the kinetics of formation of native contacts
by analyzing P(T ~ τ_F; t) from intermediate T-measurements (Eqs. (3) and (4)).
The beauty of the proposed framework is that these experiments can be readily performed
using available technology. In the current AFM experiments, T can be made as short as a few
microseconds. Simple calculations show that the relaxation of a short 50 amino acid protein
from the stretched state with L ≈ 19 nm to the coiled states {C} with, say, X ≈ 2 nm, occurs on
the timescale τ ≈ Δx²/D ~ 10 μs, where Δx = L − X ≈ 17 nm and D ≈ 10⁻⁷ cm²/s is the diffusion
constant. Clearly, the time of formation of native contacts, which drives the transition from {C}
to the NBA, prolongs τ_F by a few microseconds to a few milliseconds or more, depending on folding
conditions. In the experimental studies of forced unfolding and force-quenched refolding of
ubiquitin, τ_F was found to be of the order of 10−100 ms [24]. Computer simulation studies of
unzipping-rezipping transitions in the short 22-nt RNA hairpin P5GA have predicted that τ_F is of
the order of a few hundred microseconds [35].
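The diffusive estimate τ ≈ Δx²/D quoted above is an order-of-magnitude statement, and the arithmetic can be checked directly. The variable names below are illustrative; the input values are those given in the text.

```python
# Order-of-magnitude check of tau ~ dx^2 / D with the values quoted in the text
L_nm = 19.0            # contour length of the stretched 50-residue chain
X_nm = 2.0             # end-to-end distance in the coiled states {C}
D_cm2_per_s = 1e-7     # diffusion constant
NM_TO_CM = 1e-7        # 1 nm = 1e-7 cm

dx_cm = (L_nm - X_nm) * NM_TO_CM       # 17 nm expressed in cm
tau_s = dx_cm**2 / D_cm2_per_s         # relaxation time in seconds
tau_us = tau_s * 1e6                   # the same time in microseconds
```

With these inputs `tau_us` is about 29 μs, i.e. a few tens of microseconds, which is the same order of magnitude as the ~10 μs quoted above.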

Model for the kinetics of native contacts: To interpret the data generated by FCS it is useful
to have a model for the time evolution of the native contacts and X. We first present a simple
kinetic model for rupture and formation of native contacts represented by the probabilities P_F
and P_C in Eqs. (2) and (4), respectively, and a model for the dynamics of X given by the
propagator G_{S,Q}(X', t; X). To describe the force-dependent evolution of native interactions we
adopt the continuous-time-random-walk (CTRW) formalism [43, 44, 45, 46, 47]. In the CTRW model,
a random walker, representing rupture (formation) of native contacts, pauses in the native
(coiled) state for a time t before making a transition to the coiled (native) state. The waiting
time distribution is given by the function Ψ_α(t) (α = r or f, where r and f refer to rupture and
formation of native contacts, respectively). We assume that the probabilities P_F(t, X_F; f_S) and
P_C(t, X_C; f_Q) are separable so that

$$
P_F(t, X_F; f_S) \approx P_{eq}(X_F)\, P_r(t; f_S), \quad \text{and} \quad
P_C(t, X_C; f_Q) \approx P_C(X_C)\, P_f(t; f_Q)
\tag{6}
$$

where P_eq(X_F) is the equilibrium distribution of native states, P_C(X_C) is the distribution of
coiled states, and P_r(t; f_S) and P_f(t; f_Q) are the force-dependent probabilities of rupture and
formation of native contacts, respectively. The factorization in Eq. (6) implies that application
of force does not result in the redistribution of states X_F and X_C in the NBA and in the manifold
of coiled states {C}, but only changes the timescales for the NBA → {C} and {C} → NBA transitions
and, thus, the probabilities P_r and P_f. We expect the approximation in Eq. (6) to be valid
provided the rupture of native contacts and refolding events are cooperative.

During stretching cycles, for f_S well above f_C, we may neglect the reverse folding process.
Similarly, global unfolding is negligible during relaxation periods with f_Q < f_C. Then, the
master equation for P_r(t) is

$$
\frac{d}{dt} P_r(t) = -\int_0^{t} d\tau\, \Phi_r(\tau)\, P_r(t - \tau)
\tag{7}
$$

where Φ_r(t) is the generalized rate for the rupture of native interactions. In the Laplace
domain, defined by f(z) = ∫_0^∞ dt f(t) exp[−tz], Φ_r(z) is related to Ψ_r(z) as

$$
\Phi_r(z) = z\, \Psi_r(z) \left[ 1 - \Psi_r(z) \right]^{-1}.
\tag{8}
$$

The structure of the master equation for P_f(t) is identical to Eq. (7), with the relationship
between Φ_f(t) and Ψ_f(t) being similar to Eq. (8). The general solution to Eq. (7) is

$$
P_r(z) = \left[ z + \Phi_r(z) \right]^{-1} P_r(0)
\tag{9}
$$

where P_r(0)=1 is the initial condition, and the solution in the time domain is given by the
inverse Laplace transform P_r(t) = L⁻¹{P_r(z)}. The solution for P_f(z) is obtained in a similar
fashion (see Eq. (9)) with the initial condition P_f(0)=1.
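A short numerical check of the CTRW relations may be helpful. Laplace-transforming Eq. (7) with P_r(0)=1 gives P_r(z) = [z + Φ_r(z)]⁻¹, and inserting Eq. (8) reduces this to [1 − Ψ_r(z)]/z. The sketch below evaluates this for two waiting-time distributions chosen only for illustration: a single exponential step with rate k, and two sequential exponential steps with the same rate. Neither choice is prescribed by the text.

```python
import numpy as np

def survival_laplace(psi_hat, z):
    """P_r(z) = 1 / (z + Phi_r(z)) with Phi_r(z) = z Psi_r(z) / (1 - Psi_r(z))."""
    psi = psi_hat(z)
    phi = z * psi / (1.0 - psi)     # Eq. (8)
    return 1.0 / (z + phi)          # Laplace transform of Eq. (7) with P_r(0) = 1

k = 0.5                              # hypothetical rupture rate
z = np.array([0.1, 1.0, 10.0])       # sample points in the Laplace domain

# Single exponential step: Psi(t) = k exp(-k t), so Psi(z) = k / (z + k)
single_step = survival_laplace(lambda s: k / (s + k), z)

# Two sequential exponential steps: Psi(z) = (k / (z + k))^2
two_step = survival_laplace(lambda s: (k / (s + k)) ** 2, z)
```

For the single step the generalized rate is the constant k and P_r(z) = 1/(z + k), i.e. P_r(t) = exp(−kt). For two steps P_r(z) = (z + 2k)/(z + k)², the transform of (1 + kt) exp(−kt), which shows how a non-exponential waiting time produces non-Poissonian rupture kinetics.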

Model for the polypeptide chain: In the extended state, when the majority of native
interactions that stabilize the folded state are disrupted, the molecule can be treated roughly as
a fluctuating coil. Simulations and analysis of native structures [38] suggest that proteins behave
as worm-like chains (WLC). For convenience we use a continuous WLC description for the coil
state, whose Hamiltonian is

$$
\begin{aligned}
H &= \frac{3 k_B T}{2 l_p} \int_{-L/2}^{L/2} ds \left( \frac{\partial \mathbf{r}(s,t)}{\partial s} \right)^2
 + \frac{3 l_p k_B T}{8} \int_{-L/2}^{L/2} ds \left( \frac{\partial^2 \mathbf{r}(s,t)}{\partial s^2} \right)^2 \\
&\quad + \frac{3 k_B T}{4} \left[ \left( \frac{\partial \mathbf{r}(-L/2,t)}{\partial s} \right)^2
 + \left( \frac{\partial \mathbf{r}(L/2,t)}{\partial s} \right)^2 \right]
 - \mathbf{f}_{S,Q} \cdot \int_{-L/2}^{L/2} ds\, \frac{\partial \mathbf{r}(s,t)}{\partial s}
\end{aligned}
\tag{10}
$$

where l_p is the protein coil persistence length. A large number of force-extension curves obtained
using mechanical unfolding experiments in proteins, DNA, and RNA have been analyzed using the
WLC model. In Eq. (10) the three-dimensional Cartesian vector r(s, t) represents the spatial
location of the s-th "protein monomer" at time t. The first two terms describe chain connectivity
and bending energy, respectively. The third term represents fluctuations of the chain free ends
and the fourth term corresponds to the coupling of r to f_{S,Q}. The end-to-end vector is computed
as X(t) = r(L/2, t) − r(−L/2, t).

We need a dynamical model in which X is represented by the propagator G(X, t; X_0). Although
bond vectors of a WLC chain are correlated, the statistics of X can be represented by a
large number of independent modes. It is therefore reasonable, at least in the large L limit, to
describe G_{S,Q}(X, t; X_0) by a Gaussian,

$$
G_{S,Q}(X, t; X_0) = \left( \frac{3}{2\pi \langle X^2 \rangle_{S,Q}} \right)^{3/2}
\frac{1}{\left( 1 - \phi_{S,Q}^2(t) \right)^{3/2}}
\exp\left[ -\frac{3 \left( X - \phi_{S,Q}(t) X_0 \right)^2}{2 \langle X^2 \rangle_{S,Q} \left( 1 - \phi_{S,Q}^2(t) \right)} \right]
\tag{11}
$$

specified by the second moment ⟨X²⟩_{S,Q} and the normalized correlation function
φ_{S,Q}(t) = ⟨X(t)X(0)⟩_{S,Q}/⟨X²⟩_{S,Q}. Calculations of ⟨X²⟩_{S,Q} and φ_{S,Q}(t) are given in
the Appendix [48, 49]. In the absence of force, we obtain:



$$
\langle X(t) X(0) \rangle_0 = 12 k_B T \sum_{n=1,3,\ldots,2q+1}^{\infty} \frac{1}{z_n}\, \psi_n^2(L/2)\, e^{-z_n t/\gamma}
\tag{12}
$$

where ψ_n and z_n are the eigenfunctions and eigenvalues of the modes of the operator that
describes the dynamics of r(s, t) (see Eq. (A1)). To construct the propagator
G_{S,Q}(X, t; X_0) for f_{S,Q}, Eq. (A1) is integrated with f_{S,Q} added to the random force. We
obtain ⟨X²⟩_{S,Q} = ⟨X²⟩_0 + f_{S,Q}² Σ_n ψ_n²(L/2)/z_n², where n = 1, 3, . . . , 2q+1. We analyze
the distributions of unfolding times P(T, t) for the model sequence S1 (Figure 3) obtained using
simulations, the CTRW model for the evolution of native interactions (Eqs. (6)-(9)) and the
Gaussian statistics of the protein coil (Eq. (11)).
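The Gaussian propagator of Eq. (11) can be written down and sanity-checked numerically. In the sketch below X and X_0 are treated as three-dimensional vectors, and the correlation function is taken to be a single exponential, φ(t) = exp(−t/τ). That single-mode form, the function names and the numerical values are assumptions for illustration; the text defines φ through the full mode sum.

```python
import numpy as np

def gaussian_propagator(X, X0, x2_mean, phi_t):
    """Eq. (11): Gaussian G(X, t; X0) for 3D vectors X (last axis has length 3)."""
    X = np.asarray(X, dtype=float)
    X0 = np.asarray(X0, dtype=float)
    var = x2_mean * (1.0 - phi_t**2)              # <X^2> (1 - phi^2(t))
    norm = (3.0 / (2.0 * np.pi * var)) ** 1.5
    d2 = np.sum((X - phi_t * X0) ** 2, axis=-1)   # |X - phi(t) X0|^2
    return norm * np.exp(-1.5 * d2 / var)

def phi_single_mode(t, tau):
    """Assumed single-exponential correlation function (not the full mode sum)."""
    return np.exp(-t / tau)

# Hypothetical parameters: <X^2> = 4 nm^2, start at X0 = (3, 0, 0) nm, t = tau / 2
x2_mean = 4.0
X0 = np.array([3.0, 0.0, 0.0])
phi_t = phi_single_mode(0.5, 1.0)

# Integrate G over a cubic grid to check normalization and the mean position
axis = np.linspace(-8.0, 8.0, 81)
h = axis[1] - axis[0]
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
G = gaussian_propagator(grid, X0, x2_mean, phi_t)
total = G.sum() * h**3
mean_x = (G * grid[..., 0]).sum() * h**3
```

The propagator integrates to one and its mean relaxes as φ(t) X_0, so the initial extension is forgotten on the timescale over which φ(t) decays.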

Simulations of model β-sheet protein: The usefulness of FCS is illustrated by computing
and analyzing the distribution function P(T; t) for a model polypeptide chain with β-sheet
architecture. Sequence S1, which is a variant of an off-lattice model introduced some time ago
[39], is a coarse-grained model (CGM) of a polypeptide chain, in which each amino acid is
substituted with a united atom of appropriate mass and diameter at the position of the
C_α-carbons [40, 41]. The S1 sequence is modeled as a chain of 46 connected beads of three types,
hydrophobic B, hydrophilic L, and neutral N, with the contour length L = 46a, where a ≈ 3.8 Å
is the distance between two consecutive C_α-carbon atoms. The coordinate of the j-th residue is
given by the vector x_j with j=1, 2, . . ., N.

The potential energy U of a chain conformation is

$$
U = U_{bond} + U_{bend} + U_{da} + U_{nb},
\tag{13}
$$

where U_bond, U_bend, and U_da are the energy terms that determine local protein structure, and
U_nb corresponds to non-local (non-bonded) interactions. The bond-length potential U_bond, which
describes the chain connectivity, is given by a harmonic function

$$
U_{bond} = \frac{k_b}{2} \sum_{j=1}^{N-1} \left( |x_j - x_{j+1}| - a \right)^2,
\tag{14}
$$

where k_b = 100 ε_h/a² and ε_h (≈ 1.25 kcal/mol) is the energy unit, roughly equal to the free
energy of a hydrophobic contact. The bending potential U_bend is

$$
U_{bend} = \sum_{i=1}^{N-2} \frac{k_\theta}{2} \left( \theta_i - \theta_0 \right)^2,
\tag{15}
$$

where k_θ = 20 ε_h/rad² and θ_0 = 105°. The dihedral angle potential U_da, which is largely
responsible for maintaining protein-like secondary structure, is taken to be

$$
U_{da} = \sum_{i=1}^{N-3} \left[ A_i (1 + \cos\phi_i) + B_i (1 + \cos 3\phi_i) \right],
\tag{16}
$$

where the coefficients A_i and B_i are sequence dependent. Along the β-strands trans-states
are preferred and A_i = B_i = 1.2 ε_h. In the turn regions (i.e. in the vicinity of a cluster of N
residues) A_i = 0 and B_i = 0.2 ε_h. The non-bonded 12-6 Lennard-Jones interaction U_nb between
hydrophobic residues is the sum of pairwise energies

$$
U_{nb} = \sum_{i<j+2} U_{ij},
\tag{17}
$$

where U_ij depend on the nature of the residues. The double summation in Eq. (17) runs over
all possible pairs excluding the nearest neighbor residues. The potential U_BB between a pair of
hydrophobic residues B is given by U_BB(r) = 4λ ε_h [(a/r)¹² − (a/r)⁶], where λ is a random factor
unique for each pair of B residues [41] and r = |x_i − x_j|. For all other pairs of residues U_ij
is repulsive.

Although an off-lattice CGM drastically simplifies the polypeptide chain structure, it does
retain important characteristics of proteins, such as chain connectivity and the heterogeneity
of contact interactions. The local energy terms in S1 provide an accurate representation of the
protein topology. The native structure of S1 is a β-sheet protein that has a topology similar
to the much studied immunoglobulin domains (Figure 3). When the model sequence is subject
to f_S or f_Q, the total energy is written as U_tot = U − f_α·X (α = S or Q), where X is the
protein end-to-end vector, and f_{S,Q} = (f_{S,Q}, 0, 0) is applied along the x-direction (Figure 1).

The dynamics of the polypeptide chain is assumed to be given by the overdamped Langevin
equation, which in the absence of f_S or f_Q, is

$$
\eta \frac{d x_j}{dt} = -\frac{\partial U_{tot}}{\partial x_j} + g_j(t)
\tag{18}
$$

where η is the friction coefficient and g_j(t) is a Gaussian white noise, with the statistics

$$
\langle g_j(t) \rangle = 0, \qquad \langle g_i(t)\, g_j(t') \rangle = 6 k_B T \eta\, \delta_{ij}\, \delta(t - t').
\tag{19}
$$

Eqs. (18) are integrated with a step size δt = 0.02 τ_L, where τ_L = (m a²/ε_h)^{1/2} = 3 ps is
the unit of time and m ≈ 3×10⁻²² g is a residue mass. In Eq. (18) the value of η = 50 m/τ_L
corresponds roughly to water viscosity.
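A minimal sketch of how an overdamped Langevin equation of the form of Eq. (18) might be integrated is shown below, using the Euler-Maruyama scheme. This is not the authors' integrator. A harmonic force stands in for −∂U_tot/∂x_j, reduced units (m = a = ε_h = τ_L = 1) are used, and the noise is applied per Cartesian component with variance 2 k_B T η, on the assumption that the factor 6 in Eq. (19) is the sum over the three components.

```python
import numpy as np

def langevin_step(x, force, eta, kT, dt, rng):
    """One Euler-Maruyama step of  eta dx/dt = F(x) + g(t)."""
    # Per-component displacement noise with variance 2 kT dt / eta (assumption above)
    noise = rng.normal(0.0, np.sqrt(2.0 * kT * dt / eta), size=np.shape(x))
    return x + force(x) * dt / eta + noise

def simulate(x0, force, eta, kT, dt, n_steps, seed=0):
    """Integrate the overdamped Langevin equation and return the trajectory."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    traj = np.empty((n_steps,) + x.shape)
    for i in range(n_steps):
        x = langevin_step(x, force, eta, kT, dt, rng)
        traj[i] = x
    return traj

# Reduced units: eta = 50 m/tau_L, dt = 0.02 tau_L, kT = 0.692 eps_h (values from the text)
eta, dt, kT = 50.0, 0.02, 0.692
k_spring = 10.0                                  # hypothetical toy spring constant
traj = simulate(np.zeros(3), lambda x: -k_spring * x, eta, kT, dt, n_steps=100_000)

# Equipartition check: each component should have variance close to kT / k_spring
variance = np.mean(traj[5_000:] ** 2)
```

For the full model the toy force would be replaced by the gradient of U_tot with respect to all bead coordinates.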



III. RESULTS 

Simulations of unfolding and refolding of S1: For the model sequence S1 we have previously
shown that the equilibrium critical unfolding force is f_C ≈ 22.6 pN [40] at the temperature
T_s = 0.692 ε_h/k_B, below the folding transition temperature T_F = 0.7 ε_h/k_B. At this
temperature 70% of native contacts are formed (see the phase diagram in Ref. [40]). To simulate
the stretch-relax trajectories, the initially folded structures in the NBA were equilibrated for
60 ns at T_s. To probe forced unfolding of S1 at T = T_s, a constant pulling force f_S = 40 pN and
80 pN was applied to both termini of S1. For these values of f_S, S1 globally unfolds in t = 90 ps
and 50 ps, respectively. Cycles of stretching were interrupted by relaxation intervals during
which the force is abruptly quenched to f_Q=0 for various durations T. Unfolding-refolding
trajectories of S1 have been recorded as time series of X and the number of native contacts Q.

In Figure 4 we present a single unfolding-refolding trajectory of X and Q of SI, generated
by stretch-relax cycles. Stretching cycles of constant force $f_S$=80pN applied for 30ns are
interrupted by periods of quenched force lasting 90ns. A folding event is registered if it results
in the formation of 92% of the total number of native contacts $Q_F$=106, i.e. $Q \ge 0.92Q_F$, for the
first time. An unfolding time is defined as the time of rupture of 92% of all possible contacts for
the first time. With this definition, the unfolded state end-to-end distance is $X \ge X_U \approx 36a$. In
Figure 4, folded (unfolded) states correspond to minimal (maximal) X and maximal (minimal)
Q. Inspection of Figure 4 shows that refolding events are essentially stochastic. Out of 36
relaxation periods only 9 attempts resulted in refolding of SI. Both X and Q show that refolding
of SI occurs through an initial collapse to a coiled state with the end-to-end distance $X_C/a \approx 15$
($Q \approx 20$), followed by the establishment of additional native contacts ($Q \approx 90$) stabilizing the
folded state with $X_F/a \approx (1-2)$.
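
The first-passage criteria just described can be sketched in a few lines of Python. The function names and the array-based representation of a Q(t) trace are assumptions for illustration; only the 92% threshold and the total of 106 native contacts are taken from the text, and "rupture of 92% of contacts" is read here as at most 8% of the contacts remaining.

```python
import numpy as np

Q_F = 106        # total number of native contacts quoted in the text
FRACTION = 0.92  # threshold fraction quoted in the text


def first_folding_time(q, dt):
    """Time of the first frame with Q >= 0.92*Q_F, or None if never reached."""
    hits = np.flatnonzero(np.asarray(q) >= FRACTION * Q_F)
    return float(hits[0] * dt) if hits.size else None


def first_unfolding_time(q, dt):
    """Time of the first frame in which at least 92% of contacts are ruptured,
    i.e. Q <= 0.08*Q_F, or None if the chain never unfolds in this trace."""
    hits = np.flatnonzero(np.asarray(q) <= (1.0 - FRACTION) * Q_F)
    return float(hits[0] * dt) if hits.size else None


if __name__ == "__main__":
    # Hypothetical Q traces sampled every dt = 1 ns (made-up numbers).
    stretch = [100, 90, 50, 20, 8, 5]
    quench = [10, 50, 97, 98, 100]
    print(first_unfolding_time(stretch, dt=1.0))
    print(first_folding_time(quench, dt=1.0))
```

Returning `None` for a trace that never crosses the threshold mirrors the way failed unfolding or refolding attempts are excluded from the histograms rather than assigned an artificial time.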

We generated about 1200 single unfolding-refolding trajectories and monitored the
time-dependent behavior of X and Q. In the first set of simulations we set $f_S$=40pN and used
several values of T=24, 54, 102, 150 and 240ns. In the second set, $f_S$=80pN, and T=15, 48,
86, 120 and 180ns. Each trajectory involves four stretching cycles separated by three relaxation
intervals in which $f_Q$=0. Typical unfolding-refolding trajectories of X and Q for $f_S$=40pN,
$f_Q$=0, and T=102, 150 and 240ns are displayed in Figures 5, 6 and 7, respectively. Due to the
finite duration of the stretching cycles (90ns), unfolding of SI failed in a few cases, which were not
included in the subsequent analysis of unfolding times. Only the first stretching cycle in each
trajectory is guaranteed to start from the NBA, and for T=102ns (Figure 5) relatively few
relaxation intervals result in refolding (with large Q). This implies that the distribution of
unfolding times P(T,t) obtained from these trajectories is dominated by contributions from
the coiled states, with the kinetics of formation of the native contacts playing only a minor role.
Not unexpectedly, refolding events are more frequent when T is increased to 150ns and 240ns.
At T=150ns, Q reaches higher values (~65-75) and the failure to refold is rare (Figure 6). This
implies that as T starts to exceed the (re)folding time $\tau_F$, the distribution of unfolding events,
P(T,t), has a diminishing contribution from the coiled states
{C} and is increasingly dominated by the folded conformations in the NBA. Note that failed
refolding events are observed even at T=240ns (Figure 7), which implies large heterogeneity in
the duration of folding barrier crossing events. Figures 5-7 suggest that the folding time $\tau_F$ at
the temperature $T_s$ is in the range 100-240ns. Direct computation of the folding time from
hundreds of folding trajectories starting with the fully stretched states gives $\tau_F \approx 176$ns. The
agreement between these two estimates validates our stretch-release simulations.

Analysis of the distribution of unfolding times of SI: The theoretical considerations in our 
formalism suggest that the T-dependent heterogeneous unfolding processes occur not only from 
the NBA but also from the intermediate coil {C} states. The T-dependent protein dynamics 
can be utilized to separately probe the coil dynamics of the polypeptide chain and the kinetics 
of formation/rupture of native contacts (Q). We now utilize unfolding-refolding trajectories of 
SI, simulated for short, intermediate and long T, to build the histograms of unfolding times 
P(T,t). Using P(T,t) we provide a quantitative description of the polypeptide chain dynamics
in the coil state and of the kinetics of rupture and formation of native interactions by employing
the CTRW model for Q and Gaussian statistics for X.

We computed P(T,t) using the distribution of unfolding times obtained for $f_S$=80pN, T=15,
48 and 86ns (Figure 8), and $f_S$=40pN, T=24, 54 and 102ns (Figure 9). In both cases $f_Q$=0.
We excluded the unfolding times corresponding to the first stretch-quench cycle of each trajectory;
these were used to construct P(t) for the purpose of comparing P(t) with P(T,t) at long
T. The single-peaked P(T,t) obtained for T=15ns (Figure 8) and T=24ns (Figure 9) represents
contributions to SI unfolding from the coil manifold {C} alone. When T is increased to 48ns (Figure
8) and 54ns (Figure 9), the position of the peak shifts to longer times, i.e. from $t \approx 2.5$ns to $t \approx 5$ns
(Figure 8) and to $t \approx 10$ns (Figure 9). Furthermore, P(T,t) develops a shoulder at
$t \approx 10$ns and $t \approx 25$ns, observed for T=86ns (Figure 8) and T=102ns (Figure 9), respectively, which indicates
a growing (with T) contribution to unfolding from relaxation trajectories that reach the NBA.
At longer T=150ns, when most relaxation periods result in refolding of SI, the contribution from
coiled states diminishes, and at T=240ns P(T,t) is identical to the standard distribution P(t)
constructed from unfolding times of the first stretch-quench cycle of each trajectory. This implies
that for $f_Q$=0, $\tau_F \lesssim 240$ns and that P(T,t)→P(t) for T>240ns. The distribution P(T,t)=P(t)
constructed from unfolding times separated by T=300ns is presented in Figures 8 and 9 (top
left panel).

We use the CTRW formalism to analyze the histograms of unfolding times P(T,t), from which
the parameters that characterize the energy landscape of SI can be mapped. We describe the
kinetics of rupture and formation of native contacts by the waiting time distributions $\Psi_r$ and $\Psi_f$,

$$\Psi_r(t) = N_r\, t^{v_r - 1} e^{-k_r t}, \qquad \Psi_f(t) = N_f\, t^{v_f - 1} e^{-k_f t} \qquad (20)$$

where $k_r$ (dependent on $f_S$) and $k_f$ (dependent on $f_Q$) are the rates of rupture and formation of
native interactions, respectively, $N_{r,f} = k_{r,f}^{v_{r,f}}/\Gamma(v_{r,f})$ are normalization constants ($\Gamma(x)$ is the Gamma
function), and $v_{r,f} \ge 1$ are phenomenological parameters quantifying the deviations of the kinetics
from a Poissonian process. For instance, $v_{r,f} = 1$ implies a Poissonian process and corresponds to
standard chemical kinetics with constant rate $k_{r,f}$. We assume that both the folded and the
unfolded states are sharply distributed around the mean native and unfolded end-to-end distances
$\langle X_F \rangle$ and $\langle X_U \rangle$, respectively (Figure 2),

$$P_{eq}(X_F) = \delta(X - \langle X_F \rangle), \quad \text{and} \quad P(X_U) = \delta(X - \langle X_U \rangle) \qquad (21)$$

where $\langle X_U \rangle/a = 36$ residues corresponds to the definition of the unfolded state. For SI the contour
length is $L/a = 46$. Thus, SI is unfolded if X/a exceeds $\langle X_U \rangle/a$, which implies $\delta/a = 10$ residues (see
Figure 2 and the lower limit of integration over X in the theory of Section II). We describe the distribution of states
{C} before the transition to the NBA by a Gaussian,

$$P_C(X) = e^{-(X - \langle X_C \rangle)^2 / 2\Delta X_C^2} \qquad (22)$$

with the width $\Delta X_C$, centered around the average distance $\langle X_C \rangle$.
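
To make the role of the exponent $v$ concrete, the sketch below evaluates a waiting time density of the Gamma form, $\Psi(t) = k^{v} t^{v-1} e^{-kt}/\Gamma(v)$, and checks two properties numerically: it integrates to one, and it reduces to the exponential (Poissonian) density when $v = 1$. The function name and the crude Riemann-sum integration are illustrative choices; the parameter values are the rupture parameters listed in the Table for the lower stretching force.

```python
import math


def waiting_time_density(t, k, v):
    """Gamma-form waiting time density: k^v / Gamma(v) * t^(v-1) * exp(-k*t)."""
    if t < 0:
        return 0.0
    return k ** v / math.gamma(v) * t ** (v - 1.0) * math.exp(-k * t)


if __name__ == "__main__":
    k_r, v_r = 0.02, 6.9   # 1/ns and dimensionless, from the Table (40pN row)
    dt = 0.5               # ns, integration step of the Riemann sum
    total = sum(waiting_time_density(i * dt, k_r, v_r) for i in range(1, 8001)) * dt
    print("normalization:", total)

    # Poissonian limit: v = 1 gives k*exp(-k*t)
    print(waiting_time_density(2.0, 0.5, 1.0), 0.5 * math.exp(-1.0))
```

For $v > 1$ the density vanishes at $t = 0$ and develops a maximum at finite time, which is the signature of the cooperative (non-Poissonian) kinetics discussed in the text.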

We performed numerical fits of the histograms presented in Figures 8 and 9 using the theoretical
expressions for P(T,t). By fitting the theoretical curves to P(T,t) constructed from the short T=15ns
and T=48ns simulations (Figure 8) and the T=24ns and T=54ns simulations (Figure 9), we first studied
the dynamics of X to estimate the dynamical timescale $\tau_d$, i.e. the longest relaxation time
corresponding to the smallest eigenvalue $z_n$ (Eq. (12)), and the persistence length $l_p$ of SI in
the coil states {C}. Using these values of $\tau_d$ and $l_p$, we then applied our theory to describe P(T,t)
constructed from the long T=300ns simulations. This analysis allows us to estimate the parameters
characterizing the rupture of native contacts, $k_r$, $v_r$, $\langle X_F \rangle$ and $\Delta X_F$. Finally, the parameters $k_f$,
$v_f$, $\langle X_C \rangle$ and $\Delta X_C$, characterizing the formation of native contacts, were estimated by holding $\tau_d$, $l_p$, $k_r$,
$v_r$, $\langle X_F \rangle$ and $\Delta X_F$ fixed and fitting the theory to P(T,t) for intermediate T=86ns (Figure 8)
and T=102ns (Figure 9).

Extracting the energy landscape parameters of SI: There are a number of parameters that
characterize the energy landscape and the dynamics of the major components in the NBA→U
transition. The numerical values of the model parameters are summarized in the Table. The
values of $v_r$=6.9 for $f_S$=40pN and $v_r$=5.1 for $f_S$=80pN indicate that rupture of native contacts
is highly cooperative, especially at the lower $f_S$=40pN. This agrees with the previous findings
on the kinetics of forced unfolding of SI [40], which were based solely on unfolding SI by applying a
constant force. In contrast, the formation of native contacts is characterized by $v_f \approx 1.1$, implying
an almost Poissonian distribution for the kinetics of formation of native contacts. The structural
characteristics of the coil states are obtained using the relaxation of the polypeptide chain upon
force-quench from stretched states. The value of the persistence length $l_p$, which should be
independent of $f_Q$ provided $f_Q/f_C \ll 1$, is found to be about 4.8Å (Table). This value is in accord
with the results of recent experimental measurements based on the kinetics of loop formation
in denatured states of proteins [50].

Upon rupture of native contacts, the chain extends by $\Delta X_F/a$=6.4 (for $f_S$=40pN) and
$\Delta X_F/a$=6.7 (for $f_S$=80pN). This distance separates the basins of folded states, with $\langle X_F \rangle/a$=4.5
at $f_S$=40pN and $\langle X_F \rangle/a$=4.6 at $f_S$=80pN, from the high free energy states in which the polypeptide
chain is stretched in the direction of $f_S$ (Figure 2(a)). Because these high free energy states
are never populated, we expect that forced unfolding of SI must occur in an apparent two-state
manner when T→∞. Explicit simulations of SI unfolding at constant $f_S$ (~69pN) show that
mechanical unfolding occurs in a single step (see Figure 2 in Ref. [40]).

From the refolding free energy profile upon force-quench (see Figure 2(b)) we infer that
the initial stretched conformation must collapse to an ensemble of compact structures {C}.
From the analysis of P(T,t) using the CTRW formalism we find that the average end-to-end
distance $\langle X_C \rangle$ for the manifold {C} is close to $\langle X_F \rangle$ (see the Table), which suggests that the
ensemble of the {C}→NBA transition states is close to the native state. There is a broad
distribution of coiled states {C}, which is manifested in the large width $\Delta X_C/a$=2.2. Due to the
broad conformational distribution, there is substantial heterogeneity in the refolding pathways.
This feature is reflected in the long tails in P(T,t) (see Figures 8 and 9). As a result, we
expect the kinetic transition to be sharp. The estimated timescale ($\sim 1/k_f$) for forming native
contacts for SI is shorter than the coil dynamical timescale $\tau_d$ (for the values of $f_S$ used in the
simulations). This indicates that the dynamical collapse of SI from the stretched state $X_U \approx L$
and equilibration in the coiled manifold {C} constitutes a significant fraction of the total folding
time ($\approx \tau_d + k_f^{-1}$). From the analysis of folding of SI (P(T,t) at intermediate T) we also infer
that the transition state ensemble for {C}→N must be narrow.

From the rates of rupture of native contacts $k_r$ at the two $f_S$ values, and assuming the Bell model
for the dependence of $k_r$ on $f_S$,

$$k_r(f_S) = k_r^0\, e^{\sigma f_S / k_B T} \qquad (23)$$

we estimated the force-free rupture rate $k_r^0$ and the critical extension $\sigma$ at which folded states
of SI become unstable. We found that $k_r^0 = 8 \times 10^{-4}$ns$^{-1}$ is negligible compared to the rate of
formation of native contacts, $k_f^0 = 0.25$ns$^{-1}$. The location of the transition state of unfolding,
$X = \langle X_F \rangle + \sigma$, is characterized by $\sigma = 1.5a \approx 0.03L$. The value of $\sigma$ is small compared to $\Delta X_C$,
which is a measure of the width of the {C} manifold. Small $\sigma$ implies that the major barrier to
unfolding is close to the native conformation. A similar value of $\sigma$ was obtained in the previous
study of SI by using an entirely different approach. These findings are consistent with AFM
experiments and computer simulations, which show that native structures of proteins
appear to be "brittle" upon application of mechanical force.
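
Because the Bell model is log-linear in the force, rupture rates at two stretching forces are enough to determine both of its parameters. As a sketch (writing the critical extension as $\sigma$, the force-free rupture rate as $k_r^0$, and the two forces as $f_1 < f_2$, labels chosen here for illustration), taking the logarithm of the ratio of the two rates gives:

```latex
\begin{align}
\ln\frac{k_r(f_2)}{k_r(f_1)} &= \frac{\sigma\,(f_2 - f_1)}{k_B T}
\quad\Longrightarrow\quad
\sigma = \frac{k_B T}{f_2 - f_1}\,\ln\frac{k_r(f_2)}{k_r(f_1)}, \\
k_r^0 &= k_r(f_1)\,\exp\!\left(-\frac{\sigma f_1}{k_B T}\right).
\end{align}
```

With only two forces the inversion is exactly determined, so it offers no check on the Bell model itself; rates at a third force would be needed for that.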

The parameter $\tau_d$ is an approximate estimate of the collapse time, $\tau_c$, from the stretched to
the coiled state. Using direct simulations of the decay of the radius of gyration, $R_g$, starting
from a rod-like conformation, we obtained $\tau_c \approx 80$ns (see Supplementary Information in [53]).
The value of $\tau_d$ ($\approx 20$ns) is in reasonable agreement with the estimate of $\tau_c$. This exercise
shows that reliable estimates of timescales of conformational dynamics, which are difficult to
obtain, can be made using FCS. To ascertain the extent to which the estimate of the unfolding time $\tau_U$ agrees with
independent calculations, we obtained $\tau_U$ by applying a constant force to unfold SI. The
value of $\tau_U$, obtained by averaging over 200 trajectories, is about 90ns at $f_S$=40pN, which is in
rough accord with $\tau_U \approx \tau_d + k_r^{-1} \approx 70$ns. This further validates the efficacy of FCS in obtaining
the energy landscape of proteins. We also estimated the prefactor $K_U^0$ from the value of $\tau_U$ obtained by direct
simulation and the Bell model. The $f_S$-dependent unfolding rate $1/\tau_U$, with $\tau_U \approx \tau_d + k_r^{-1}$, increases with
$f_S$ in accord with Eq. (23). The prefactor $K_U^0$ is about tenfold smaller than $k_r^0$. The difference
may be due either to the failure of the assumption that $k_r^0 = K_U^0$ or to the breakdown of the Bell
model.
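
The estimate of the unfolding time as the sum of the coil relaxation time and the inverse rupture rate is simple arithmetic on the Table entries. The short Python sketch below reproduces it; the dictionary layout and function name are illustrative only.

```python
# Table entries used in the estimate: tau_d in ns, k_r in 1/ns, keyed by f_S in pN.
TABLE = {
    40: {"tau_d": 19.6, "k_r": 0.02},
    80: {"tau_d": 15.2, "k_r": 0.11},
}


def unfolding_time(tau_d, k_r):
    """Unfolding time estimated as tau_d + 1/k_r (in ns)."""
    return tau_d + 1.0 / k_r


if __name__ == "__main__":
    for force, row in TABLE.items():
        print(f"f_S = {force} pN: tau_U ~ {unfolding_time(**row):.1f} ns")
```

At the lower force the sum is dominated by the rupture term $1/k_r$ = 50ns, whereas at the higher force the chain relaxation time $\tau_d$ is the larger of the two contributions.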

IV. DISCUSSION



In this Section we summarize the main steps for practical implementation of the proposed 
Force Correlation Spectroscopy (FCS) to probe the energy landscape of proteins using forced 
unfolding of proteins. 

Step 1. Evaluating the (re)folding timescale $\tau_F$: In the first phase of the FCS experiments, one
needs to collect a series of histograms $P(T_n, t)$, n=1, 2, ..., N, of unfolding times for increasing
relaxation times $T_1 < T_2 < \ldots < T_N$ by repeated stretch-release experiments. This can be done by
discarding the first unfolding time $t_1$ in the sequence of recorded unfolding times $\{t_1, t_2, \ldots, t_M\}$
for each $T_n$, to guarantee that all the unfolding events are generated from the stretched states
with the distribution $P(X_U)$ (see Eq. (21)). This is a crucial element of the FCS methodology
since it enables us to perform the averaging over the final (stretched) states. It is easier to resolve
experimentally the end-to-end distance $X \approx L$ than the initial (folded) states, in which
a number of conformations belong to the NBA. The histograms are compared with $P(T^*, t)$
obtained for sufficiently long $T^* \gg \tau_F$. To ensure that $T^*$ exceeds $\tau_F$, $T^*$ can be as long as a few
tens of minutes. The time at which $D(T_n)$, defined in Section II, is equal to zero can then be used
to estimate $\tau_F$. Notice that our estimate of $\tau_F$ does not hinge on whether $P(T \to \infty; t) = P(t)$ is
Poissonian or not! Clearly, the choice of $T^*$ depends on the protein under study, and prior
knowledge or bulk measurements of unfolding times observed under the influence of temperature
jump or denaturing agents can serve as a guide to estimate the order of magnitude of $T^*$.
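
The bookkeeping in Step 1 can be sketched as follows. The code builds a normalized histogram of unfolding times after discarding the first unfolding time of each trajectory, and compares two histograms with an L1 distance. The L1 distance is only a hypothetical stand-in for a histogram-difference measure; it is not the definition of $D(T_n)$ used in the theory, and all function names and numbers are illustrative.

```python
import numpy as np


def unfolding_histogram(trajectories, bins):
    """Normalized histogram of unfolding times.

    `trajectories` is a list of sequences [t_1, t_2, ...]; the first unfolding
    time of each sequence is discarded, as described in Step 1.
    """
    times = np.concatenate([np.asarray(t[1:], dtype=float) for t in trajectories])
    density, _ = np.histogram(times, bins=bins, density=True)
    return density


def histogram_difference(p, p_ref, bins):
    """L1 distance between two binned densities.

    ASSUMPTION: a generic difference measure, not the paper's D(T_n).
    """
    widths = np.diff(bins)
    return float(np.sum(np.abs(p - p_ref) * widths))


if __name__ == "__main__":
    bins = np.array([0.0, 1.0, 2.0, 3.0])
    short_T = [[9.0, 0.5, 1.5], [9.0, 0.2, 2.5]]   # made-up unfolding times
    long_T = [[9.0, 0.5, 0.6]]                     # made-up reference data
    p = unfolding_histogram(short_T, bins)
    p_ref = unfolding_histogram(long_T, bins)
    print(p, p_ref, histogram_difference(p, p_ref, bins))
```

In practice one would track the chosen difference measure as a function of $T_n$ and look for the relaxation time beyond which it is indistinguishable from zero within sampling noise.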

Step 2. Resolving the dynamics of the polypeptide chain: At this point we have determined the
ensemble average (re)folding time, $\tau_F$. In the second phase of the FCS, we perform statistical
analysis of the distribution of unfolding times collected at $T \ll \tau_F$, i.e. $P(T \ll \tau_F; t)$ (regime I in
Section II). This allows us to probe the dynamic properties of the polypeptide chain, such as the
protein persistence length $l_p$ and the protein dynamical timescale $\tau_d$ (see the Table). Indeed, by
assuming a reasonable model for the conditional probability, $G(X', t; X)$, of the protein
end-to-end distance and the distribution of the stretched states, $P(X_U)$, $l_p$ and $\tau_d$ can be determined
from the fit (either analytical or numerical) of the regime I unfolding time distribution, $P(T \ll \tau_F; t)$,
to the histogram of unfolding times collected for $T \ll \tau_F$. To illustrate the utility
of the FCS, in the present work we assumed a Gaussian profile for $G_{S,Q}(X', t; X)$
and the worm-like chain (WLC) model for the polypeptide chain. The general formulae allow for
the use of more sophisticated models of X, should it become necessary. Recent single molecule
FRET experiments on proteins [55, 56], dsDNA, ssDNA, and RNA [57] show, surprisingly, that
the characteristics of unfolded states obey worm-like chain models. Moreover, all the data in
forced unfolding of proteins have been analyzed using WLC models. Thus, the analysis of
FCS data using WLC dynamics for unfolded polypeptide chains is to a large extent justified.
$G_S(X', t; X)$ and $G_Q(X', t; X)$ can be "measured" in the current AFM and LOT experiments
by computing the frequency of occurrence of the event $X'$ after the forced stretch ($f = f_S$) or force
quench ($f = f_Q$) from the well-controlled partially stretched state X or the fully stretched state
$X \approx L$ of the chain, respectively, over time t ($\ll \tau_F$).

Step 3. Probing the kinetics of rupture of the protein native contacts: Having resolved the
dynamics of the protein in the extension-time regime, where the number of native interactions that
stabilize the native state is small, we can resolve the kinetics of forced rupture of native
interactions stabilizing the NBA (regime II). In the third part of the FCS we analyze the distribution
of unfolding times for $T \gg \tau_F$. We use the knowledge about the propagator
$G_S(X', t'; X, t)$ obtained in Step 2 to perform an analytical or
numerical fit of the distribution $P(T \gg \tau_F; t)$ to the histogram of unfolding times collected for
$T \gg \tau_F$. The new information gathered in Step 3 sheds light on the kinetics of native
interactions stabilizing the NBA, which is contained in the probability $P_F(t; f_S, X_F)$.
For convenience, we used the continuous time random walk (CTRW) model for $P_F(t; f_S, X_F)$,
together with the assumption of separability.
CTRW reduces to Poissonian kinetics with constant rates when the waiting time
distribution function for the rupture of native contacts, $\Psi_r(t)$, is an exponential function of t. The
CTRW probes possible deviations of the kinetics of $P_F(t; f_S, X_F)$ from the Poisson process
and allows one to test different functional forms for $\Psi_r(t)$. In the simplest implementation of CTRW
utilized in the present work, $\Psi_r(t)$ is assumed to be an algebraic function of t, given by Eq.
(20), which allows us to estimate the rate of rupture of native interactions, $k_r$, and the parameter
$v_r$ quantifying the deviations of the rupture kinetics from a Poissonian process. Furthermore,
by repeating Step 3 for different values of the stretching force, $f_S$, and assuming the Bell model
for $k_r(f_S)$, given by Eq. (23), we can also estimate the force-free rupture rate, $k_r^0$, and the
critical extension $\sigma$, which quantifies the distance from the NBA to the transition state along
the direction of $f_S$. We also obtain the average end-to-end distance in the folded state, $\langle X_F \rangle$,
from the distribution of the native states $P_{eq}(X_F)$.
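
Before a full fit of the unfolding time distribution is attempted, rough starting values for a rate and an exponent of a Gamma-form waiting time density can be obtained by the method of moments, since such a density has mean $v/k$ and variance $v/k^2$. The Python sketch below illustrates this; it is a swapped-in shortcut that ignores the chain propagator and is not the fitting procedure of this work. The function name and synthetic data are assumptions.

```python
import numpy as np


def gamma_moment_estimates(times):
    """Method-of-moments estimates (k, v) for a Gamma-form waiting time density.

    Uses mean = v/k and variance = v/k^2, so v = mean^2/var and k = mean/var.
    """
    t = np.asarray(times, dtype=float)
    mean = t.mean()
    var = t.var(ddof=1)
    return mean / var, mean ** 2 / var


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic waiting times with known parameters (k = 0.1 per ns, v = 5).
    samples = rng.gamma(shape=5.0, scale=1.0 / 0.1, size=200_000)
    k_est, v_est = gamma_moment_estimates(samples)
    print(f"k ~ {k_est:.3f} 1/ns, v ~ {v_est:.2f}")
```

An estimated exponent well above one would flag non-Poissonian kinetics and motivate the more careful fit described in Step 3.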

Step 4. Resolving the kinetics of formation of native interactions: In the final step the
distributions $P(T \ll \tau_F; t)$ and $P(T \gg \tau_F; t)$, analyzed in Steps 2 and 3 respectively, are used
to form a linear superposition (regime III). The T-dependent weights are given by
the probabilities $p_C(T)$ and $p_F(T) = 1 - p_C(T)$, respectively. This superposition is used to fit the
histogram of unfolding times, $P(T \sim \tau_F; t)$, collected for $T \sim \tau_F$. The estimated probability $p_F(T)$
should then be matched with the probability obtained by performing the double integration in the
corresponding theoretical expression. This allows us to probe the kinetics of formation of native contacts, $P_C(T; X, f_Q)$, for the
known propagator $G_Q(X', T; X)$ analyzed in Step 2. As in the case of $P_F(t; f_S, X_F)$, we assumed
the separability condition for $P_C(t; f_Q, X_C)$ and CTRW for the kinetics of formation of
native contacts contained in $P_f(t; f_Q)$. A simple algebraic form for the waiting
time distribution function, $\Psi_f(t)$, given by Eq. (20), allows us to estimate the force-free rate
of formation of native interactions, $k_f(f_Q = 0) = k_f^0$. Moreover, the heterogeneity of the protein
folding pathways can be assessed by analyzing the width, $\Delta X_C$, of the distribution of coiled
protein states, $P_C(X)$, centered around the average end-to-end distance, $\langle X_C \rangle$ (see Eq. (22)).
Similar to the analysis of rupture kinetics, Step 4 could be repeated for two values of the
quenched force, $f_Q$, to yield the force-free rate of formation of the native contacts stabilizing the
native fold, and the distance between $\langle X_C \rangle$ and the transition state for the formation of native
contacts. For the purposes of illustration, in the present work we used $f_Q$=0.
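
The linear superposition used in Step 4 has a single unknown weight once the short-T and long-T distributions are fixed, so a least-squares estimate of that weight has a closed form. The sketch below assumes the three distributions are sampled on a common set of time bins; the function name, the clipping to [0, 1] and the example numbers are illustrative choices rather than part of the published procedure.

```python
import numpy as np


def coil_weight(hist, p_short, p_long):
    """Least-squares weight p_C in  hist ~ p_C*p_short + (1 - p_C)*p_long.

    Minimizing the squared residual over p_C gives
    p_C = <hist - p_long, d> / <d, d>  with  d = p_short - p_long.
    """
    hist, p_short, p_long = (np.asarray(a, dtype=float) for a in (hist, p_short, p_long))
    d = p_short - p_long
    denom = float(np.dot(d, d))
    if denom == 0.0:
        raise ValueError("short-T and long-T distributions are identical")
    p_c = float(np.dot(hist - p_long, d)) / denom
    return min(1.0, max(0.0, p_c))  # keep the weight a valid probability


if __name__ == "__main__":
    p_short = np.array([0.6, 0.3, 0.1])   # made-up binned P(T << tau_F; t)
    p_long = np.array([0.1, 0.3, 0.6])    # made-up binned P(T >> tau_F; t)
    mixed = 0.3 * p_short + 0.7 * p_long  # synthetic intermediate-T histogram
    print(coil_weight(mixed, p_short, p_long))
```

Repeating the estimate for several intermediate T values gives an empirical $p_C(T)$ curve, which is the quantity to be compared with the theoretical probability in Step 4.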

At the minimum, FCS can be used to obtain a model-independent estimate of $\tau_F$. By assuming
a WLC description for coiled states, which is justified in light of a number of FRET and forced
unfolding experiments, estimates of collapse times and their distribution, as well as the persistence
length, can be obtained. If the CTRW model is assumed, then estimates of the timescales for rupture and
formation of native contacts can be made. The application of FCS to SI illustrates the efficacy of
the theory. The potential for obtaining hitherto unavailable information makes FCS extremely
useful.

V. CONCLUSIONS

In this paper we have developed a theory to describe the role of internal relaxation of polypeptide
chains in the dynamics of single molecule force-induced unfolding and force-quench refolding.
To probe the effect of the dynamics of the chain in the compact manifold of states that are
populated in the pathways to the NBA starting from the stretched conformations, we propose using
a series of stretch-release cycles. In this new class of single molecule experiments, referred to
as force correlation spectroscopy (FCS), the duration of release times (T) is varied. FCS is
equivalent to conventional mechanical unfolding experiments in the limit T→∞. By applying
our theory to a model $\beta$-sheet protein we have shown that the parameters that characterize the
energy landscape of proteins can be obtained using the joint distribution function of unfolding
times P(T,t).

The experimentally controllable parameters are $f_S$, $f_Q$, and T. In our illustrative example,
we used values of $f_S$ that are approximately (2-4) times greater than the equilibrium unfolding
force. We set $f_Q$=0, which is difficult to realize in experiments. From the schematic energy
landscape in Figure 2 it is clear that the profiles corresponding to the positions of the manifold
{C}, the dynamics of {C}, and the transition state location and barrier height depend on $f_Q$.
The simple application, used here for proof of principle purposes only, already illustrates the
power of FCS. By using FCS over a broader range of $f_S$ and $f_Q$, a complete characterization
of the energy landscape of SI can be made. The experiments that
we propose based on the new theoretical development can be readily performed using presently
available technology. Indeed, the pioneering experimental setup used by Fernandez and Li [24],
who utilized force to initiate refolding, can be readily adapted to perform single molecule
FCS.

It is known that even for proteins that fold in an apparent two-state manner the energy
landscape is rough. The scale of roughness $\Delta E$ can be measured in conventional AFM
experiments by varying temperature. The effect of $\Delta E$, whose value is between $(2-5)k_BT$, on the
internal dynamics of proteins during force-quench refolding
is hard to predict. These subtle effects of the energy landscape can be resolved (in principle)
using FCS in which temperature is also varied.
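
APPENDIX A: CALCULATION OF $\langle X(t)X(0) \rangle$

In this Appendix we outline the calculation of $\langle X(t)X(0) \rangle$ and $\langle X^2 \rangle$ for the force-free
propagator $G_Q(X, t; X_0)$. By using the WLC Hamiltonian $H$ of the main text (without the last term) and
applying the least action principle to the WLC Lagrangian
$\mathcal{L} = \frac{m}{2}\int_{-L/2}^{L/2} ds\, (\partial \mathbf{r}/\partial t)^2 - H$, we obtain:
$m\frac{\partial^2}{\partial t^2}\mathbf{r}(s,t) + \epsilon\frac{\partial^4}{\partial s^4}\mathbf{r}(s,t) - 2\nu\frac{\partial^2}{\partial s^2}\mathbf{r}(s,t) = 0$,
where m is the protein segment mass and $\epsilon = 3 l_p k_B T/4$, $\nu = 3 k_B T/2 l_p$.
Dynamics of the medium is taken into account by including a stochastic force $\mathbf{f}(s,t)$ with the
white noise statistics, $\langle f_\alpha(s,t) \rangle = 0$, $\langle f_\alpha(s,t) f_\beta(s',t') \rangle = 2\gamma k_B T \delta_{\alpha\beta}\, \delta(s - s')\, \delta(t - t')$, where $\alpha = x$,
y, z, and $\gamma$ is the friction coefficient per unit coil length. In the overdamped limit, the equation
of motion for $\mathbf{r}(s,t)$ is

$$\gamma \frac{\partial}{\partial t}\mathbf{r}(s,t) + \epsilon \frac{\partial^4}{\partial s^4}\mathbf{r}(s,t) - 2\nu \frac{\partial^2}{\partial s^2}\mathbf{r}(s,t) = \mathbf{f}(s,t) \qquad (A1)$$

with the boundary conditions,

$$\left[ 2\nu \frac{\partial}{\partial s}\mathbf{r}(s,t) - \epsilon \frac{\partial^3}{\partial s^3}\mathbf{r}(s,t) \right]_{\pm L/2} = 0, \qquad
\left[ 2\nu_0 \frac{\partial}{\partial s}\mathbf{r}(s,t) + \epsilon \frac{\partial^2}{\partial s^2}\mathbf{r}(s,t) \right]_{\pm L/2} = 0 \qquad (A2)$$

where $\nu_0 = 3 k_B T/4$. We solve Eq. (A1) by expanding $\mathbf{r}(s,t)$ and $\mathbf{f}(s,t)$ in a complete set of
orthonormal eigenfunctions $\{\psi_n(s)\}$, i.e.

$$\mathbf{r}(s,t) = \sum_{n=0}^{\infty} \boldsymbol{\xi}_n(t)\, \psi_n(s) \quad \text{and} \quad \mathbf{f}(s,t) = \sum_{n=0}^{\infty} \mathbf{f}_n(t)\, \psi_n(s) \qquad (A3)$$

Substituting Eqs. (A3) into Eq. (A1) and separating variables we obtain:

$$\epsilon \frac{d^4}{ds^4}\psi_n(s) - 2\nu \frac{d^2}{ds^2}\psi_n(s) = z_n \psi_n(s) \quad \text{and} \quad \gamma \frac{d}{dt}\boldsymbol{\xi}_n(t) + z_n \boldsymbol{\xi}_n(t) = \mathbf{f}_n(t) \qquad (A4)$$

where $z_n$ is the n-th eigenvalue. The second of Eqs. (A4), for $\boldsymbol{\xi}_n$, is solved by

$$\boldsymbol{\xi}_n(t) = \frac{1}{\gamma} \int_{-\infty}^{t} dt'\, \mathbf{f}_n(t') \exp\left[ -\frac{z_n (t - t')}{\gamma} \right] \qquad (A5)$$

and the eigenfunctions $\psi_n(s)$ are

$$\psi_0 = \sqrt{1/L}$$
$$\psi_n(s) = \sqrt{c_n/L} \left( \alpha_n \frac{\sin[\alpha_n s]}{\cos[\alpha_n L/2]} + \beta_n \frac{\sinh[\beta_n s]}{\cosh[\beta_n L/2]} \right), \quad n = 1, 3, \ldots, 2q+1$$
$$\psi_n(s) = \sqrt{c_n/L} \left( \alpha_n \frac{\cos[\alpha_n s]}{\sin[\alpha_n L/2]} - \beta_n \frac{\cosh[\beta_n s]}{\sinh[\beta_n L/2]} \right), \quad n = 2, 4, \ldots, 2q \qquad (A6)$$

where $c_n$ are the normalization constants; $\alpha_n$ and $\beta_n$ are determined from Eqs. (A2):

$$\alpha_n^3 \sin[\alpha_n L/2] \cosh[\beta_n L/2] - \beta_n^3 \cos[\alpha_n L/2] \sinh[\beta_n L/2]
- \frac{2\nu_0}{\epsilon}(\alpha_n^2 + \beta_n^2) \cos[\alpha_n L/2] \cosh[\beta_n L/2] = 0, \quad n = 1, 3, \ldots, 2q+1$$
$$\alpha_n^3 \cos[\alpha_n L/2] \sinh[\beta_n L/2] + \beta_n^3 \sin[\alpha_n L/2] \cosh[\beta_n L/2]
+ \frac{2\nu_0}{\epsilon}(\alpha_n^2 + \beta_n^2) \sin[\alpha_n L/2] \sinh[\beta_n L/2] = 0, \quad n = 2, 4, \ldots, 2q \qquad (A7)$$

The parameters $\alpha_n$ and $\beta_n$ are related as $\beta_n^2 - \alpha_n^2 = 2\nu/\epsilon$. The eigenvalues $z_n$ are
given by $z_n = \epsilon \alpha_n^4 + 2\nu \alpha_n^2$. Using Eqs. (A3) and (A5), we obtain for the internal modes ($n \ge 1$):
$\langle \mathbf{r}(s,t)\cdot\mathbf{r}(s',0) \rangle = 3 k_B T \sum_{n} \frac{1}{z_n} \psi_n(s) \psi_n(s')\, e^{-z_n t/\gamma}$;
the uniform n=0 mode does not contribute to X. Then,
$\langle X(t)X(0) \rangle = \langle \mathbf{r}(\tfrac{L}{2}, t)\cdot\mathbf{r}(\tfrac{L}{2}, 0) \rangle + \langle \mathbf{r}(-\tfrac{L}{2}, t)\cdot\mathbf{r}(-\tfrac{L}{2}, 0) \rangle
- \langle \mathbf{r}(\tfrac{L}{2}, t)\cdot\mathbf{r}(-\tfrac{L}{2}, 0) \rangle - \langle \mathbf{r}(-\tfrac{L}{2}, t)\cdot\mathbf{r}(\tfrac{L}{2}, 0) \rangle$,
which yields the expression for $\langle X(t)X(0) \rangle$ used in the main text.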
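
The step from the mode solution to the correlation function can be made explicit. The following is a sketch under two assumptions that are not spelled out in the text: the noise statistics carry over to the mode amplitudes as $\langle f_{n,\alpha}(t) f_{m,\beta}(t') \rangle = 2\gamma k_B T\, \delta_{nm}\delta_{\alpha\beta}\delta(t - t')$ (a consequence of the orthonormality of the eigenfunctions), and the time integral in the mode solution extends from $-\infty$.

```latex
\begin{align}
\langle \boldsymbol{\xi}_n(t)\cdot\boldsymbol{\xi}_m(0)\rangle
 &= \frac{1}{\gamma^2}\int_{-\infty}^{t}\!dt'\int_{-\infty}^{0}\!dt''\,
    \langle \mathbf{f}_n(t')\cdot\mathbf{f}_m(t'')\rangle\,
    e^{-z_n (t-t')/\gamma}\,e^{z_m t''/\gamma} \\
 &= \frac{6 k_B T}{\gamma}\,\delta_{nm}\int_{-\infty}^{0}\!dt''\,
    e^{-z_n (t-2t'')/\gamma}
  = \frac{3 k_B T}{z_n}\,\delta_{nm}\,e^{-z_n t/\gamma},
  \qquad t \ge 0,\; z_n > 0 .
\end{align}
```

The factor of 3 comes from summing the three Cartesian components, and the restriction $z_n > 0$ excludes the uniform n=0 mode, which describes free diffusion of the chain as a whole and cancels in the end-to-end vector.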

TABLE I: Energy landscape parameters for SI extracted from FCS.

f_S, pN (a)  l_p/a (b)  tau_d, ns (c)  k_r, 1/ns (d)  v_r (e)  <X_F>/a (f)  dX_F/a (g)  k_f, 1/ns (h)  v_f (i)  <X_C>/a (j)  dX_C/a (k)
40           1.2        19.6           0.02           6.9      4.5          6.4         0.26           1.1      4.8          2.2
80           1.1        15.2           0.11           5.1      4.6          6.7         0.25           1.1      4.7          2.2

(a) $f_S$ is the magnitude of the stretching force
(b) $l_p$ is the persistence length of SI in the coiled state, measured in units of a
(c) $\tau_d$ is the $f_Q$-dependent longest relaxation time in the coil state (Eq. (12))
(d) $k_r$ ($k_f$) is the rate of rupture (formation) of native interactions (Eq. (20)) and is a function of $f_S$ ($f_Q$)
(e) $v_r$ ($v_f$) quantifies deviations of the native contacts rupture (formation) kinetics from the Poisson process
(f) $\langle X_F \rangle$ ($\langle X_C \rangle$) is the average end-to-end distance of SI in the NBA (manifold {C}) (Figure 2(b), Eqs. (21)-(22))
(g) $\Delta X_F$ (dX_F in the header) is the extension of the chain prior to rupture of all native contacts (Figure 2(a))
(h) $k_r$ ($k_f$) is the rate of rupture (formation) of native interactions (Eq. (20)) and is a function of $f_S$ ($f_Q$)
(i) $v_r$ ($v_f$) quantifies deviations of the native contacts rupture (formation) kinetics from the Poisson process
(j) $\langle X_F \rangle$ ($\langle X_C \rangle$) is the average end-to-end distance of SI in the NBA (manifold {C}) (Figure 2(b), Eqs. (21)-(22))
(k) $\Delta X_C$ (dX_C in the header) is the width of the distribution of coiled states of SI (Eq. (22)), a measure of the refolding heterogeneity

FIGURE CAPTIONS

Figure 1. a: A typical AFM setup: a constant force $\mathbf{f} = \mathbf{f}_S = f_S \hat{x}$ is applied through the
cantilever tip linker in the direction x parallel to the protein end-to-end vector X. Stretching
cycles are interrupted by relaxation intervals T during which the force is quenched, $\mathbf{f} = \mathbf{f}_Q = f_Q \hat{x}$
($f_S > f_Q$). b: A single trajectory of forced unfolding times $t_1, t_2, t_3, \ldots$, separated by a fixed
relaxation time T, during which the unfolded protein can either collapse into the manifold of coiled
states {C} if T is short or reach the native basin of attraction (NBA) if T is long.

Figure 2. Schematic of the free energy profile of a protein (red) upon stretching at constant
force $f_S$ and force-quench $f_Q$. (a): The projection of the energy landscape (blue) is in the direction
of X, which is a suitable reaction coordinate for unfolding induced by force $f_S$. The average
end-to-end distance in the native basin of attraction is $\langle X_F \rangle$. Upon application of $f_S$, rupture of
contacts that stabilize the folded state F results in the formation of an ensemble of high energy
extended (by $\Delta X_F$) conformations {I}. Subsequently, transitions to the globally unfolded state U
(with $L - \delta \le X \le L$) occur. (b): Free energy profile for force-quench refolding, which occurs in
the order U→{C}→F. Refolding is initiated by quenching the force $f_S \to f_Q < f_C$, where $f_C$ is the
equilibrium critical force needed to unfold the native protein. The initial event in the process
is the formation of an ensemble of compact structures. The mean end-to-end distance of {C}
is $\langle X_C \rangle$ and the width is $\Delta X_C$, which is a measure of the heterogeneity of the refolding pathways.
These states may or may not end up in the native basin of attraction (NBA) depending on the
duration of T. We have used X as a reaction coordinate during force-quench for purposes of
illustration only.

Figure 3. Native structure of the model protein SI. The model polypeptide chain has a
$\beta$-sheet architecture in the native state. The $\beta$-strands of the model chain are formed by native
contacts between hydrophobic residues (shown as blue balls). The hydrophilic residues are shown
as red balls and the residues forming the turn regions are shown in grey.

Figure 4. A single unfolding-refolding trajectory of the end-to-end distance X/a (black)
and the total number of native contacts Q (red) as a function of time t for SI. The trajectory
is obtained by repeated application of stretch-quench cycles with stretching force $f_S$=80pN and
quenched force $f_Q$=0. The durations of the stretching cycle and the relaxation period are 30ns and 90ns,
respectively. The first five unfolding events, corresponding to large X/a and small Q, are marked
explicitly by the numbers 1, 2, 3, 4 and 5. Force stretch and force quench for the stretch-quench
cycles 13, 14, 15, 16 and 17 (middle panel) are denoted by solid green and dash-dotted blue
arrows, respectively.

Figure 5. Typical unfolding-refolding trajectories of X/a (black) and Q (red) for SI as
functions of time t, simulated by applying four stretch-quench cycles at the pulling force $f_S$=40pN
and quenched force $f_Q$=0. The duration of the relaxation time is T=102ns.

Figure 6. Examples of unfolding-refolding trajectories of X/a (black) and Q (red) for SI
as a function of time t. The pulling force is $f_S$=40pN and the quenched force is $f_Q$=0. The
duration of the relaxation time is T=150ns.

Figure 7. Same as Figure 6 except T=240ns. 

Figure 8. Histograms of forced unfolding times P(t) and the joint distributions of unfolding
times separated by relaxation periods of the quenched force, P(T,t). The distribution functions
are constructed from single unfolding-refolding trajectories of SI simulated in stretch-quench
cycles of $f_S$=80pN and $f_Q$=0 for T=15ns, 48ns and 86ns. Simulated distributions are shown
by red bars, with the contribution to global unfolding events from coiled conformations {C}
indicated by an arrow for T=86ns. The results of the numerical fits to the theory
are represented by solid lines. The energy landscape parameters of SI are summarized
in the Table.

Figure 9. Histograms of forced unfolding times P(t) and P(T,t) constructed from single
unfolding-refolding trajectories for SI. The stretch-quench cycles were simulated with $f_S$=40pN
and $f_Q$=0 for T=24ns, 54ns and 102ns. Simulated distributions are shown by red bars, with the
contribution to global unfolding events from coiled conformations {C} indicated by an arrow
for T=102ns. The results of the numerical fits to the theory are represented by
solid lines. The values of the parameters are given in the Table.



